-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure socket path is < 104 characters when STATE_PATH is set. #4909
Conversation
This pull request does not have a backport label. Could you fix it @belimawr? 🙏
NOTE: |
3859da7
to
8679507
Compare
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
The changes are covered by integration tests. |
This commit fixes the implementation on Windows and puts the container integration tests in the container group again. The test for long state paths will create files outside of the test directory. It also cleans up the default folder used as state path.
6dcc963
to
e46948b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just have a couple of nitpicks
Also if the STATE_PATH is too long and the default is used, would that be clear in diagnostics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, no major blockers, Just a nitpick on a direct cast for the agent cmd in test.
While I understand that this may not work for everyone, I feel that the state path should be mounted on a well known location as a local persistent volume or some other volume stored on the host that should have the same lifetime as the deployment: having the STATE_PATH
env var does not feel very container/cloud-native..
Co-authored-by: Michel Laterman <[email protected]>
expectedStatePath: "/tmp/tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt", | ||
expectedSocketPath: "/tmp/elastic-agent/oKUUJbxrLlGSh3z6wZWYleLeMuUN4P0_.sock", | ||
}, | ||
"long path": { // Path too long to create a unix socket, it will use /tmp/elastic-agent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't most of these test cases just tests of the SocketURLWithFallback()
function? How long does this new set of tests take to run?
Do you get the same effective test coverage if you update the single existing container command to use a path longer than 107 characters as the exception path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't most of these test cases just tests of the SocketURLWithFallback() function? How long does this new set of tests take to run?
No, they're testing the https://github.com/elastic/elastic-agent/blob/a6d91b029ae8c0b8ccf7bb391c8ac304fe1e9368/internal/pkg/agent/cmd/container.go#L773-L837
function and how it decides to create the socket path. The fact that it calls SocketURLWithFallback
is an implementation detail. It also needs to create some folders:
elastic-agent/internal/pkg/agent/cmd/container.go
Lines 784 to 788 in a6d91b0
if _, err := os.Stat(configPath); errors.Is(err, fs.ErrNotExist) { if err := os.MkdirAll(configPath, 0755); err != nil { return fmt.Errorf("cannot create folders for config path '%s': %w", configPath, err) } } elastic-agent/internal/pkg/agent/cmd/container.go
Lines 794 to 796 in a6d91b0
if err := os.MkdirAll(topPath, 0755); err != nil { return fmt.Errorf("preparing STATE_PATH(%s) failed: %w", statePath, err) }
All of this gets tested by the integration test.
Maybe we can remove one test case: 107 characters path
as it essentially is the same as small path
.
Once this compiles, make sure the |
It works:
|
Quality Gate failedFailed conditions |
Everything works except he sonar unit code coverage which is superseded by the integration tests hence force merging this. |
What does this PR do?
It ensures the
statePath
set by thecontainer
command will always generate an Unix socket path that is smaller than 108 characters. This ensures the Unix socket can be created and the Elastic-Agent always successfully starts.Why is it important?
Unix socket paths need to be less than 104 characters long, if they're not Elastic-Agent will fail to create a socket and will not start. When it fails it logs those messages:
Checklist
[ ] I have made corresponding changes to the documentation[ ] I have made corresponding change to the default configuration files./changelog/fragments
using the changelog toolDisruptive User Impact
The Elastic-Agent container command will not always respect the
STATE_PATH
environment variable. In cases where it would lead to a Unix socket path too long, which prevents the Elastic-Agent from starting, the default path/usr/share/elastic-agent/state
is used.How to test this PR locally
FLEET_ENROLL
FLEET_URL
FLEET_ENROLLMENT_TOKEN
STATE_PATH
to be longer than 100 charactersi. The Elastic-Agent runs and can communicate with Fleet
ii. The state path is the default path :
/usr/share/elastic-agent/state
Another option is to run the integration test
TestContainerCMDWithAVeryLongStatePath
fromtesting/integration/container_cmd_test.go
## Related issuesQuestions to ask yourself